36 research outputs found

    From Questions to Effective Answers: On the Utility of Knowledge-Driven Querying Systems for Life Sciences Data

    We compare two distinct approaches for querying data in the context of the life sciences. The first approach utilizes conventional databases to store the data and intuitive form-based interfaces to facilitate easy querying of the data. These interfaces could be seen as implementing a set of "pre-canned" queries commonly used by the life science researchers that we study. The second approach is based on semantic Web technologies and is knowledge (model) driven. It utilizes a large OWL ontology and the same datasets as before, but represented as RDF instances of the ontology concepts. An intuitive interface is provided that allows the formulation of RDF triples-based queries. Both these approaches are being used in parallel by a team of cell biologists in their daily research activities, with the objective of gradually replacing the conventional approach with the knowledge-driven one. This provides us with a valuable opportunity to compare and qualitatively evaluate the two approaches. We describe several benefits of the knowledge-driven approach in comparison to the traditional way of accessing data, and highlight a few limitations as well. We believe that our analysis not only explicitly highlights the specific benefits and limitations of semantic Web technologies in our context but also contributes toward effective ways of translating a question in a researcher's mind into precise computational queries with the intent of obtaining effective answers from the data. While researchers often assume the benefits of semantic Web technologies, we explicitly illustrate these in practice.

    A Semantic Problem Solving Environment for Integrative Parasite Research: Identification of Intervention Targets for Trypanosoma cruzi

    Effective research in parasite biology requires analyzing experimental lab data in the context of constantly expanding public data resources. Integrating lab data with public resources is particularly difficult for biologists who may not possess significant computational skills to acquire and process heterogeneous data stored at different locations. Therefore, we develop a semantic problem solving environment (SPSE) that allows parasitologists to query their lab data integrated with public resources using ontologies. An ontology specifies a common vocabulary and formal relationships among the terms that describe an organism and, in this case, experimental data and processes. SPSE supports capturing and querying provenance information, which is metadata on the experimental processes and data recorded for reproducibility, and includes a visual query-processing tool to formulate complex queries without learning the query language syntax. We demonstrate the significance of SPSE in identifying gene knockout targets for T. cruzi. The overall goal of SPSE is to help researchers discover new or existing knowledge that is implicitly present in the data but not always easily detected. Results demonstrate improved usefulness of SPSE over existing lab systems and approaches, and support for complex query design that is otherwise difficult to achieve without knowledge of the query language syntax.

    The steady-state transcriptome of the four major life-cycle stages of Trypanosoma cruzi

    Background: Chronic chagasic cardiomyopathy is a debilitating and frequently fatal outcome of human infection with the protozoan parasite, Trypanosoma cruzi. Microarray analysis of gene expression during the T. cruzi life-cycle could be a valuable means of identifying drug and vaccine targets based on their appropriate expression patterns, but results from previous microarray studies in T. cruzi and related kinetoplastid parasites have suggested that the transcript abundances of most genes in these organisms do not vary significantly between life-cycle stages. Results: In this study, we used whole genome, oligonucleotide microarrays to globally determine the extent to which T. cruzi regulates mRNA relative abundances over the course of its complete life-cycle. In contrast to previous microarray studies in kinetoplastids, we observed that relative transcript abundances for over 50% of the genes detected on the T. cruzi microarrays were significantly regulated during the T. cruzi life-cycle. The significant regulation of 25 of these genes was confirmed by quantitative reverse-transcriptase PCR (qRT-PCR). The T. cruzi transcriptome also mirrored published protein expression data for several functional groups. Among the differentially regulated genes were members of paralog clusters, nearly 10% of which showed divergent expression patterns between cluster members. Conclusion: Taken together, these data support the conclusion that transcript abundance is an important level of gene expression regulation in T. cruzi. Thus, microarray analysis is a valuable screening tool for identifying stage-regulated T. cruzi genes and metabolic pathways.

    A unified framework for managing provenance information in translational research

    Background: A critical aspect of the NIH Translational Research roadmap, which seeks to accelerate the delivery of "bench-side" discoveries to patient's "bedside," is the management of the provenance metadata that keeps track of the origin and history of data resources as they traverse the path from the bench to the bedside and back. A comprehensive provenance framework is essential for researchers to verify the quality of data, reproduce scientific results published in peer-reviewed literature, validate scientific process, and associate trust value with data and results. Traditional approaches to provenance management have focused on only partial sections of the translational research life cycle and they do not incorporate "domain semantics", which is essential to support domain-specific querying and analysis by scientists. Results: We identify a common set of challenges in managing provenance information across the pre-publication and post-publication phases of data in the translational research lifecycle. We define the semantic provenance framework (SPF), underpinned by the Provenir upper-level provenance ontology, to address these challenges in the four stages of provenance metadata: (a) Provenance collection - during data generation; (b) Provenance representation - to support interoperability, reasoning, and incorporate domain semantics; (c) Provenance storage and propagation - to allow efficient storage and seamless propagation of provenance as the data is transferred across applications; (d) Provenance query - to support queries with increasing complexity over large data size and also support knowledge discovery applications. We apply the SPF to two exemplar translational research projects, namely the Semantic Problem Solving Environment for Trypanosoma cruzi (T.cruzi SPSE) and the Biomedical Knowledge Repository (BKR) project, to demonstrate its effectiveness. Conclusions: The SPF provides a unified framework to effectively manage provenance of translational research data during pre and post-publication phases. This framework is underpinned by an upper-level provenance ontology called Provenir that is extended to create domain-specific provenance ontologies to facilitate provenance interoperability, seamless propagation of provenance, automated querying, and analysis.

    High Throughput Selection of Effective Serodiagnostics for Trypanosoma cruzi infection

    The diagnosis of Trypanosoma cruzi infection (the cause of human Chagas disease) is difficult because the symptoms of the infection are often absent or non-specific, and because the parasites themselves are usually below the level of detection in the infected subjects. Therefore, diagnosis generally depends on the measurement of T. cruzi–specific antibodies produced in response to the infection. However, current methods to detect anti–T. cruzi antibodies are relatively poor. In this study, we have conducted a broad screen of >400 T. cruzi proteins to identify those proteins which are best able to detect anti–T. cruzi antibodies. Using a set of proteins selected by this screen, we were able to detect 100% of >100 confirmed positive human cases of T. cruzi infection, as well as suspect cases that were negative using existing tests. This protein panel was also able to detect apparent changes in infection status following drug treatment of individuals with chronic T. cruzi infection. The results of this study should allow for significant improvements in the detection of T. cruzi infection and better screening methods to avoid blood transfusion–related transmission of the infection, and offer a crucial tool for determining the success or failure of drug treatment and other intervention strategies to limit the impact of Chagas disease.

    The Knowledge-Driven Exploration of Integrated Biomedical Knowledge Sources Facilitates the Generation of New Hypotheses

    Knowledge gained from the scientific literature can complement newly obtained experimental data in helping researchers understand the pathological processes underlying diseases. However, unless the scientific literature and experimental data are semantically integrated, it is generally difficult for scientists to exploit the two sources effectively. We argue that, in addition to the semantic integration of heterogeneous knowledge sources, the usability of the integrated resource by scientists is dependent upon the availability of knowledge visualization and exploration tools. Moreover, the integration techniques must be scalable and the exploration interfaces must be easy for bench scientists to use. The end goal of such integrated knowledge sources and exploration tools is to enable scientists to generate novel hypotheses from the knowledge they explore. We tested the feasibility of our approach on a real use case in the domain of human health and parasite biology. On the one hand, we integrated the experimental data generated as part of ongoing research on Chagas disease with the knowledge extracted from PubMed articles, using Semantic Web technologies. On the other hand, we developed iExplore, a web tool with a graphical interface for interactive knowledge exploration, that allows non-technical users to explore the integrated knowledge base using a relationship-focused approach. We illustrate the effectiveness of our approach by describing the knowledge-driven process of using iExplore to generate a new hypothesis for the treatment of Chagas disease.
